
Prompt engineering: The gateway to extracting value from Generative AI

Truly harnessing the power of generative AI requires the right prompts or questions, which is why prompt engineering has become an integral discipline for the future of innovation
 
Nicholas Ismail
Global Head of Brand Journalism, HCLTech
13-minute read

Generative AI, or GenAI, is mentioned in the same breath as some of the world’s most disruptive technological advancements.

But just as the steam engine needed a human to drive it and shovel coal into the furnace, generative AI needs prompt engineers to elicit the right outputs.

What is a prompt engineer?

Prompt engineering refers to the art and science of crafting precise instructions, or prompts, to get the desired outputs from generative AI models. This is crucial because, depending on how a query is written, the results can differ dramatically. A prompt engineer is needed to carefully design and shape the behavior of language models, like ChatGPT, to ensure interactions are reliable. This matters for controlling output quality, improving response relevance, avoiding bias and enhancing the user experience when interacting with these next-gen AI models.

The role of the prompt engineer has been around for some time, but today, thanks to the rise of generative AI, it’s now one of the most sought-after technology jobs on the planet, often with a six-figure salary.

Speaking to the World Economic Forum at Davos 2023, Professor Erik Brynjolfsson, Director of the Digital Economy Lab at Stanford University, said: “Right now, it would be downright dangerous to use [generative AI programs] without having a human in the loop, but I think even going forward we are going to develop a new job, the job of prompt engineering.”

He continued: “You will all be hearing about it soon. Prompt engineering is the idea that when you work with one of these large language models, you can write different kinds of queries and it turns out that depending how you write the query, you get dramatically different results.

“Even the inventors of these technologies are surprised at some of the things you can get them to do if you ask the question the right way.”

Prompt engineering techniques

Prompt engineering has rapidly evolved into a distinct discipline in AI. Today, researchers and practitioners have developed sophisticated techniques and methodologies for optimizing prompts so that they produce the desired outputs without bias.

Several techniques have been adopted, including fine-tuning, zero-shot, one-shot and chain of thought (CoT) prompting.

1. Fine-tuning

Fine-tuning can improve the performance and relevance of an AI model’s responses. To do this, language models that have already been trained on large datasets are fine-tuned with specific tasks or domains. This enables engineers to adapt models so that they can respond to particular prompts in different contexts.

“Fundamentally, prompt engineering is about getting the model to do what you want at inference time by providing enough context, instruction and examples without changing the underlying weights. Fine-tuning, on the other hand, is about doing the same thing, but by directly updating the model parameters using a dataset that captures the distribution of tasks you want it to accomplish,” writes Niels Bantilan, Chief ML Engineer at Union.ai, in a recent blog.
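
As a rough illustration, here is a minimal sketch of what fine-tuning might look like in practice, using the open-source Hugging Face transformers and datasets libraries. The model, the toy dataset and the training settings are all illustrative assumptions, not a production recipe.

```python
# A minimal fine-tuning sketch: unlike prompt engineering, this updates the
# model's weights. Model name, data and settings are illustrative assumptions.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A hypothetical domain-specific corpus capturing the task distribution.
texts = [
    "Q: What is the greenhouse effect? A: The trapping of heat by gases...",
    "Q: Name one renewable energy source. A: Solar power.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # directly updates the model parameters
```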

2. Zero-shot

An emerging technique, zero-shot prompting is the process of crafting a prompt the AI model can answer without being given any examples of the task. Zero-shot prompting leaves the judgement to the LLM, on the assumption that its training has already equipped it to answer the specific question. Most zero-shot examples are qualitative tasks, such as classification.

To understand why this is useful, the Founding Researcher at Machine Learning Mastery writes: “Imagine the case of sentiment analysis: You can take paragraphs of different opinions and label them with a sentiment classification. Then you can train a machine learning model (e.g., RNN on text data) to take a paragraph as input and generate a classification as output. But you would find that such a model is not adaptive. If you add a new class to the classification, or ask it not to classify the paragraphs but to summarize them, this model must be modified and retrained.

“A large language model, however, need not be retrained. You can ask the model to classify a paragraph or summarize it if you know how to ask correctly. This means the model probably cannot classify a paragraph into categories A or B, since the meanings of “A” and “B” are unclear. Still, it can classify into “positive sentiment” or “negative sentiment”, since the model knows what “positive” and “negative” should be. This works because, during training, the model learned the meaning of these words and acquired the ability to follow simple instructions.”

In simple terms, users can ask the algorithm to classify a piece of text into neutral, negative or positive:

Classify the following text into neutral, negative or positive.
Text: Raja is planning to go on a vacation.
Sentiment:

<Output>: Neutral
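
In code, a zero-shot prompt might look like the sketch below, which assumes the OpenAI Python SDK, an API key in the environment and an illustrative model name. Note that the prompt contains no examples: the model must rely entirely on what it learned during training.

```python
# A zero-shot sketch: no examples are provided, only the task and the input.
# Assumes OPENAI_API_KEY is set; the model name is an illustrative assumption.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Classify the following text into neutral, negative or positive.\n"
    "Text: Raja is planning to go on a vacation.\n"
    "Sentiment:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # e.g. "Neutral"
```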

Moving forward, there are risks in leaving decisions entirely to the LLM, but advancements in zero-shot capabilities could enable AI models to deliver outputs on a wide range of tasks with minimal data, which is useful given the scarcity of good-quality data.

3. One-shot

One-shot prompting is where you teach the LLM to perform a task by giving it a single example in the prompt. This is slightly more reliable than zero-shot prompting, where you leave the LLM to its own devices.

An example would be:

“Kootu” is a traditional South Indian side dish. It is a dish where you add all the vegetables you have and cook it to a consistency close to gravy. You eat it by mixing it with rice, or you can have it with the pancake version of South India, the Dosa. An example of a sentence that uses the word “Kootu” is: I had five vegetables available in my fridge and I decided to make “Kootu” today.

“Shaadi” means wedding in many Indian languages. Give me an example of a sentence using “Shaadi”.

<Output>: My best friend celebrated his Shaadi today and we had a great time there.
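
The same pattern can be sketched in code, again assuming the OpenAI Python SDK and an illustrative model name; the only change from the zero-shot sketch is that the prompt now carries a single worked example ahead of the real question.

```python
# A one-shot sketch: one demonstration ("Kootu") precedes the actual query
# ("Shaadi"). Assumes OPENAI_API_KEY; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

prompt = (
    '"Kootu" is a traditional South Indian side dish made by cooking whatever '
    "vegetables you have to a consistency close to gravy. An example of a "
    'sentence that uses the word "Kootu" is: I had five vegetables available '
    'in my fridge and I decided to make "Kootu" today.\n\n'
    '"Shaadi" means wedding in many Indian languages. '
    'Give me an example of a sentence using "Shaadi".'
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```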

4. Chain of thought (CoT)

Another method is chain-of-thought (CoT) prompting, which enables complex reasoning capabilities by asking the model to break a problem into steps.

For example, Raja went to the market and got 4 boxes of tomatoes. Each box has 10 tomatoes. Raja already had 4 tomatoes in his fridge. How many tomatoes does Raja have in total?

Here the LLM might get confused between boxes and tomatoes and return an answer of 80, treating the 4 tomatoes in the fridge as 4 more boxes. That is the wrong answer. This is where users can ask the LLM to think step by step, for example by appending “Let’s think step by step” to the prompt, so that it reasons its way to the right answer.

Now, the LLM would output the following: Raja got 4 boxes. Each box has 10 tomatoes. So, Raja brought 4 x 10 = 40 tomatoes home. He already had 4 tomatoes in the fridge. Therefore, now he has 40 + 4 = 44 tomatoes.

This is the right answer.
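
In code, chain-of-thought prompting can be as simple as appending a step-by-step instruction to the question, as in this sketch (same assumed SDK and illustrative model name as the earlier examples).

```python
# A chain-of-thought sketch: "Let's think step by step" nudges the model to
# show intermediate reasoning instead of jumping straight to a final number.
from openai import OpenAI

client = OpenAI()

question = (
    "Raja went to the market and got 4 boxes of tomatoes. Each box has 10 "
    "tomatoes. Raja already had 4 tomatoes in his fridge. How many tomatoes "
    "does Raja have in total?"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": question + " Let's think step by step."}],
)
print(response.choices[0].message.content)  # expected: 4 x 10 + 4 = 44
```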

Separating good and bad prompts: Specificity and context

Often, what separates good and bad prompts is detail, specificity and above all, context.

To illustrate the difference, we asked ChatGPT itself for some examples of good and bad prompts that demonstrate how they can influence the quality of answers generated.

Example 1: Topic-specific question

Good prompt: Can you provide an overview of the greenhouse effect and its impact on climate change?

Bad prompt: Tell me about the weather?

According to ChatGPT, the good prompt is very specific and provides clear context, enabling a well-informed response. The bad prompt, on the other hand, lacks specificity and clear direction, which will lead to a very general or unrelated response.

Example 2: Contextualizing the response

Good prompt: In the movie ‘Inception,’ what is the main character’s objective?

Bad prompt: What happens in ‘Inception’?

Here, the good prompt sets the context of the question by mentioning the movie and asks a very specific question within this context, which will elicit an accurate response. The bad prompt lacks specificity and prompts a more general response.

Example 3: Avoiding leading questions

Good prompt: What are the potential benefits and drawbacks of using renewable energy sources?

Bad prompt: Renewable energy sources are great, aren’t they?

In this example, the good prompt presents an open-ended question that encourages a balanced response, which will discuss the positives and negatives of renewable energy sources to form an unbiased answer. The bad prompt contains a leading statement, which will create bias—in this case leaning towards the positive of renewable energy, while disregarding the negatives.

These examples are straightforward, but they highlight the difference between well-crafted prompts, which are specific, contextual and open-ended, and bad prompts, which lack specificity and context and rely on leading statements.

In a more complex situation, for example using generative AI to generate code or build business applications from scratch, well-crafted prompts will need extremely granular detail to produce the correct outputs. This will require the presence of prompt engineers with deep contextual knowledge of the specific project.
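
As a purely hypothetical illustration, such a granular prompt might be assembled from a template like the one below; every field name and value in it is an assumption made up for the example.

```python
# A hypothetical prompt template for code generation, showing how context,
# constraints and the task itself might be made explicit and granular.
CODE_GEN_TEMPLATE = """You are a senior {language} developer on the {project} project.

Context:
- Coding standard: {standard}
- Target framework: {framework}

Task: {task}

Constraints:
- Include unit tests.
- Explain any trade-offs you make."""

prompt = CODE_GEN_TEMPLATE.format(
    language="Python",
    project="customer billing",
    standard="PEP 8",
    framework="FastAPI",
    task="Write an endpoint that validates and stores a customer invoice.",
)
print(prompt)
```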


Looking ahead with prompt engineering

Organizations are just beginning to scratch the surface of what generative AI can do. This means that prompt engineering is not going anywhere. The discipline has immense potential for further advancements and applications, but there is a significant need for ongoing research, collaboration and ethical practices that will shape the future of prompt engineering.

Looking ahead, automated methods for generating effective prompts, which are currently manually crafted, might emerge. In addition, to enhance explainability of decisions and build trust, prompt engineers will need to build interpretability into their prompts, which will enable AI models to explain the outputs generated.

In terms of mitigating the challenges of bias and fairness, a major societal concern when it comes to AI, these should be factored in by design when generating the prompts—just like building in security features at the design stage of product development—and engineers should incorporate diverse perspectives into prompts.

Finally, the importance of human involvement can’t be overstated in this latest evolution of AI. If generative AI is a symphony, then prompt engineers are the conductor. One can’t function without the other. Human-algorithm collaboration, in addition to regulation, will be vital in generating the prompts needed to navigate the relatively unknown waters of an AI-enabled future.
